 riesz representer


Penalized GMM Framework for Inference on Functionals of Nonparametric Instrumental Variable Estimators

Bakhitov, Edvard

arXiv.org Machine Learning

This paper develops a penalized GMM (PGMM) framework for automatic debiased inference on functionals of nonparametric instrumental variable estimators. We derive convergence rates for the PGMM estimator and provide conditions for root-n consistency and asymptotic normality of debiased functional estimates, covering both linear and nonlinear functionals. Monte Carlo experiments on the average derivative show that the PGMM-based debiased estimator performs on par with the analytical debiased estimator that uses the known closed-form Riesz representer, achieving 90-96% coverage while the plug-in estimator falls below 5%. We apply our procedure to estimate mean own-price elasticities in a semiparametric demand model for differentiated products. Simulations confirm near-nominal coverage while the plug-in severely undercovers. Applied to IRI scanner data on carbonated beverages, debiased semiparametric estimates are approximately 20% more elastic than the logit benchmark, and debiasing corrections are heterogeneous across products, ranging from negligible to several times the standard error.


Two Approaches to Direct Estimation of Riesz Representers

Bruns-Smith, David

arXiv.org Machine Learning

The Riesz representer is a central object in semiparametric statistics and debiased/doubly-robust estimation. Two literatures in econometrics have highlighted the role for directly estimating Riesz representers: the automatic debiased machine learning literature (as in Chernozhukov et al., 2022b), and an independent literature on sieve methods for conditional moment models (as in Chen et al., 2014). These two literatures solve distinct optimization problems that in the population both have the Riesz representer as their solution. We show that with unregularized or ridge-regularized linear, sieve, or RKHS models, the two resulting estimators are numerically equivalent. However, for other regularization schemes such as the Lasso, or more general machine learning function classes including neural networks, the estimators are not necessarily equivalent. In the latter case, the Chen et al. (2014) formulation yields a novel constrained optimization problem for directly estimating Riesz representers with machine learning. Drawing on results from Birrell et al. (2022), we conjecture that this approach may offer statistical advantages at the cost of greater computational complexity.


Riesz Representer Fitting under Bregman Divergence: A Unified Framework for Debiased Machine Learning

Kato, Masahiro

arXiv.org Machine Learning

Estimating the Riesz representer is central to debiased machine learning for causal and structural parameter estimation. We propose generalized Riesz regression, a unified framework that estimates the Riesz representer by fitting a representer model via Bregman divergence minimization. This framework includes the squared loss and the Kullback-Leibler (KL) divergence as special cases: the former recovers Riesz regression, while the latter recovers tailored loss minimization. Under suitable model specifications, the dual problems correspond to covariate balancing, which we call automatic covariate balancing. Moreover, under the same specifications, outcome averages weighted by the estimated Riesz representer satisfy Neyman orthogonality even without estimating the regression function, a property we call automatic Neyman orthogonalization. This property not only reduces the estimation error of Neyman orthogonal scores but also clarifies a key distinction between debiased machine learning and targeted maximum likelihood estimation. Our framework can also be viewed as a generalization of density ratio fitting under Bregman divergences to Riesz representer estimation, and it applies beyond density ratio estimation. We provide convergence analyses for both reproducing kernel Hilbert space (RKHS) and neural network model classes. A Python package for generalized Riesz regression is available at https://github.com/MasaKat0/grr.


Automatic debiased machine learning and sensitivity analysis for sample selection models

Bjelac, Jakob, Chernozhukov, Victor, Klotz, Phil-Adrian, Kueck, Jannis, Schmitz, Theresa M. A.

arXiv.org Machine Learning

In this paper, we extend the Riesz representation framework to causal inference under sample selection, where both treatment assignment and outcome observability are non-random. Formulating the problem in terms of a Riesz representer enables stable estimation and a transparent decomposition of omitted variable bias into three interpretable components: a data-identified scale factor, outcome confounding strength, and selection confounding strength. For estimation, we employ the ForestRiesz estimator, which accounts for selective outcome observability while avoiding the instability associated with direct propensity score inversion. We assess finite-sample performance through a simulation study and show that conventional double machine learning approaches can be highly sensitive to tuning parameters due to their reliance on inverse probability weighting, whereas the ForestRiesz estimator delivers more stable performance by leveraging automatic debiased machine learning. In an empirical application to the gender wage gap in the U.S., we find that our ForestRiesz approach yields larger treatment effect estimates than a standard double machine learning approach, suggesting that ignoring sample selection leads to an underestimation of the gender wage gap. Sensitivity analysis indicates that implausibly strong unobserved confounding would be required to overturn our results. Overall, our approach provides a unified, robust, and computationally attractive framework for causal inference under sample selection.


ScoreMatchingRiesz: Auto-DML with Infinitesimal Classification

Kato, Masahiro

arXiv.org Machine Learning

This study proposes Riesz representer estimation methods based on score matching. The Riesz representer is a key component in debiased machine learning for constructing $\sqrt{n}$-consistent and efficient estimators in causal inference and structural parameter estimation. To estimate the Riesz representer, direct approaches have garnered attention, such as Riesz regression and the covariate balancing propensity score. These approaches can also be interpreted as variants of direct density ratio estimation (DRE) in several applications such as average treatment effect estimation. In DRE, it is well known that flexible models can easily overfit the observed data due to the estimand and the form of the loss function. To address this issue, recent work has proposed modeling the density ratio as a product of multiple intermediate density ratios and estimating it using score-matching techniques, which are often used in the diffusion model literature. We extend score-matching-based DRE methods to Riesz representer estimation. Our proposed method not only mitigates overfitting but also provides insights for causal inference by bridging marginal effects and average policy effects through time score functions.


Riesz Regression As Direct Density Ratio Estimation

Kato, Masahiro

arXiv.org Machine Learning

Riesz regression has garnered attention as a tool in debiased machine learning for causal and structural parameter estimation (Chernozhukov et al., 2021). This study shows that Riesz regression is closely related to direct density-ratio estimation (DRE) in important cases, including average treatment effect (ATE) estimation. Specifically, the idea and objective in Riesz regression coincide with the one in least-squares importance fitting (LSIF, Kanamori et al., 2009) in direct density-ratio estimation. While Riesz regression is general in the sense that it can be applied to Riesz representer estimation in a wide class of problems, the equivalence with DRE allows us to directly import existing results in specific cases, including convergence-rate analyses, the selection of loss functions via Bregman-divergence minimization, and regularization techniques for flexible models, such as neural networks. Conversely, insights about the Riesz representer in debiased machine learning broaden the applications of direct density-ratio estimation methods. This paper consolidates our prior results in Kato (2025a) and Kato (2025b).
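
As a rough illustration of the squared-loss objective mentioned in this abstract, the sketch below minimizes the empirical Riesz loss for the ATE functional, mean(alpha(D,X)^2) - 2 * mean(alpha(1,X) - alpha(0,X)), under the assumption of a linear-in-features model alpha(d,x) = b(d,x)'rho. The function names, the feature map, and the simulated data are hypothetical and are not taken from the paper. With this model the loss is quadratic in rho and has a closed-form minimizer.

```python
import numpy as np

def riesz_regression_ate(basis, d, x):
    """Closed-form minimizer of the empirical Riesz loss for the ATE functional.

    basis(d, x) -> (n, k) array is a hypothetical user-supplied feature map.
    Returns the coefficients rho and the fitted representer at the observed data.
    """
    n = len(d)
    b_obs = basis(d, x)                      # features at the observed treatment
    b_one = basis(np.ones(n), x)             # features with treatment set to 1
    b_zero = basis(np.zeros(n), x)           # features with treatment set to 0
    gram = b_obs.T @ b_obs / n               # G = mean of b b'
    moment = (b_one - b_zero).mean(axis=0)   # M = mean of b(1,X) - b(0,X)
    rho = np.linalg.solve(gram, moment)      # first-order condition: G rho = M
    return rho, b_obs @ rho

def treatment_indicator_basis(d, x):
    # Simplest possible basis (an assumption for illustration): ignores covariates.
    return np.column_stack([d, 1.0 - d])

# Simulated data, purely illustrative.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
d = rng.binomial(1, 0.3, size=500).astype(float)

rho, alpha_hat = riesz_regression_ate(treatment_indicator_basis, d, x)
p_hat = d.mean()
```

With the indicator basis, the fitted coefficients reduce to 1/p_hat and -1/(1 - p_hat), that is, the familiar inverse-probability weights with a constant propensity score. A richer basis in x would let the fitted representer vary with covariates without estimating the propensity score separately.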


A Unified Theory for Causal Inference: Direct Debiased Machine Learning via Bregman-Riesz Regression

Kato, Masahiro

arXiv.org Machine Learning

This note introduces a unified theory for causal inference that integrates Riesz regression, covariate balancing, density-ratio estimation (DRE), targeted maximum likelihood estimation (TMLE), and the matching estimator in average treatment effect (ATE) estimation. In ATE estimation, the balancing weights and the regression functions of the outcome play important roles, where the balancing weights are referred to as the Riesz representer, bias-correction term, and clever covariates, depending on the context. Riesz regression, covariate balancing, DRE, and the matching estimator are methods for estimating the balancing weights, where Riesz regression is essentially equivalent to DRE in the ATE context, the matching estimator is a special case of DRE, and DRE is in a dual relationship with covariate balancing. TMLE is a method for constructing regression function estimators such that the leading bias term becomes zero. Nearest Neighbor Matching is equivalent to Least Squares Density Ratio Estimation and Riesz Regression.


Direct Debiased Machine Learning via Bregman Divergence Minimization

Kato, Masahiro

arXiv.org Machine Learning

We develop a direct debiased machine learning framework comprising Neyman targeted estimation and generalized Riesz regression. Our framework unifies Riesz regression for automatic debiased machine learning, covariate balancing, targeted maximum likelihood estimation (TMLE), and density-ratio estimation. In many problems involving causal effects or structural models, the parameters of interest depend on regression functions. Plugging regression functions estimated by machine learning methods into the identifying equations can yield poor performance because of first-stage bias. To reduce such bias, debiased machine learning employs Neyman orthogonal estimating equations. Debiased machine learning typically requires estimation of the Riesz representer and the regression function. For this problem, we develop a direct debiased machine learning framework with an end-to-end algorithm. We formulate estimation of the nuisance parameters, the regression function and the Riesz representer, as minimizing the discrepancy between Neyman orthogonal scores computed with known and unknown nuisance parameters, which we refer to as Neyman targeted estimation. Neyman targeted estimation includes Riesz representer estimation, and we measure discrepancies using the Bregman divergence. The Bregman divergence encompasses various loss functions as special cases, where the squared loss yields Riesz regression and the Kullback-Leibler divergence yields entropy balancing. We refer to this Riesz representer estimation as generalized Riesz regression. Neyman targeted estimation also yields TMLE as a special case for regression function estimation. Furthermore, for specific pairs of models and Riesz representer estimation methods, we can automatically obtain the covariate balancing property without explicitly solving the covariate balancing objective.
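
For orientation, the standard ATE case shows how the two nuisance parameters named in this abstract enter a Neyman orthogonal score. The notation below is an assumption introduced for illustration and is not taken from the paper: $W = (Y, D, X)$, regression function $g_0(d, x) = E[Y \mid D = d, X = x]$, and propensity score $e_0(x) = P(D = 1 \mid X = x)$.

```latex
\begin{align*}
\theta_0 &= E\left[ g_0(1, X) - g_0(0, X) \right], \\
\alpha_0(D, X) &= \frac{D}{e_0(X)} - \frac{1 - D}{1 - e_0(X)}, \\
\psi(W; \theta, g, \alpha) &= g(1, X) - g(0, X) + \alpha(D, X)\,\bigl(Y - g(D, X)\bigr) - \theta .
\end{align*}
```

Here $\alpha_0$ is the Riesz representer of the ATE functional, and $E[\psi(W; \theta_0, g_0, \alpha_0)] = 0$. First-order errors in $g$ or $\alpha$ do not affect this moment condition, which is the orthogonality property that the abstract relies on.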


Nearest Neighbor Matching as Least Squares Density Ratio Estimation and Riesz Regression

Kato, Masahiro

arXiv.org Machine Learning

This study proves that Nearest Neighbor (NN) matching can be interpreted as an instance of Riesz regression for automatic debiased machine learning. Lin et al. (2023) show that NN matching is an instance of density-ratio estimation with their new density-ratio estimator. Chernozhukov et al. (2024) develop Riesz regression for automatic debiased machine learning, which directly estimates the Riesz representer (or equivalently, the bias-correction term) by minimizing the mean squared error. In this study, we first prove that the density-ratio estimation method proposed in Lin et al. (2023) is essentially equivalent to Least-Squares Importance Fitting (LSIF) proposed in Kanamori et al. (2009) for direct density-ratio estimation. Furthermore, we derive Riesz regression using the LSIF framework. Based on these results, we derive NN matching from Riesz regression. This study is based on our work Kato (2025a) and Kato (2025b).


Learning density ratios in causal inference using Bregman-Riesz regression

Hines, Oliver J., Miles, Caleb H.

arXiv.org Machine Learning

The ratio of two probability density functions is a fundamental quantity that appears in many areas of statistics and machine learning, including causal inference, reinforcement learning, covariate shift, outlier detection, independence testing, importance sampling, and diffusion modeling. Naively estimating the numerator and denominator densities separately using, e.g., kernel density estimators, can lead to unstable performance and suffers from the curse of dimensionality as the number of covariates increases. For this reason, several methods have been developed for estimating the density ratio directly based on (a) Bregman divergences or (b) recasting the density ratio as the odds in a probabilistic classification model that predicts whether an observation is sampled from the numerator or denominator distribution. Additionally, the density ratio can be viewed as the Riesz representer of a continuous linear map, making it amenable to estimation via (c) minimization of the so-called Riesz loss, which was developed to learn the Riesz representer in the Riesz regression procedure in causal inference. In this paper we show that all three of these methods can be unified in a common framework, which we call Bregman-Riesz regression. We further show how data augmentation techniques can be used to apply density ratio learning methods to causal problems, where the numerator distribution typically represents an unobserved intervention. We show through simulations how the choice of Bregman divergence and data augmentation strategy can affect the performance of the resulting density ratio learner. A Python package is provided for researchers to apply Bregman-Riesz regression in practice using gradient boosting, neural networks, and kernel methods.
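
The statement that the density ratio is a Riesz representer can be written out in a few lines. The notation is an assumption for illustration and is not taken from the paper: $p$ and $q$ are the numerator and denominator densities, $r_0 = p/q$, and $r_0$ is assumed to be square-integrable under $q$.

```latex
\begin{align*}
E_p[f(X)] &= \int f(x)\, \frac{p(x)}{q(x)}\, q(x)\, dx = E_q\left[ r_0(X)\, f(X) \right]
  \quad \text{for all } f \in L_2(q), \\
E_q\left[ (r(X) - r_0(X))^2 \right] &= E_q\left[ r(X)^2 \right] - 2\, E_p\left[ r(X) \right] + E_q\left[ r_0(X)^2 \right], \\
r_0 &= \arg\min_{r}\; E_q\left[ r(X)^2 \right] - 2\, E_p\left[ r(X) \right].
\end{align*}
```

The first line says that $r_0$ represents the linear map $f \mapsto E_p[f(X)]$ on $L_2(q)$. The last line is the squared-loss (Riesz loss) criterion: it depends on $p$ only through an expectation, so it can be estimated from samples of the two distributions without estimating either density.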